Spectral normalization employing hidden Markov modeling of line spectrum pair frequencies
Abstract
This paper proposes a spectral normalization approach in which the acoustic qualities of an input speech waveform are mapped onto those of a desired neutral voice. Such a method can be effective in reducing the impact of speaker variability, such as accent, stress, and emotion, on speech recognition. In the proposed method, the transformation is performed by modeling the temporal characteristics of the Line Spectrum Pair (LSP) frequencies of the neutral voice using hidden Markov models. The overall approach is integrated into a pitch synchronous overlap and add (PSOLA) analysis/synthesis framework. The algorithm is objectively evaluated using a distance measure based on the log-likelihood of observing the input (or normalized input) speech given Gaussian mixture speaker models for both the input and the desired neutral voice. Results under this Gaussian mixture model based criterion demonstrate consistent normalization on a 10-speaker database.
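The abstract does not give the exact form of the distance measure, so the following is only a minimal sketch of one plausible reading: the average per-frame log-likelihood of a feature sequence under a "neutral" Gaussian mixture model minus the same quantity under the "source" speaker model. The function names, the diagonal-covariance assumption, and the toy models are all hypothetical and are not taken from the paper.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_avg_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood of frames under a diagonal GMM."""
    total = 0.0
    for x in frames:
        comps = [
            math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        peak = max(comps)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(c - peak) for c in comps))
    return total / len(frames)

def normalization_score(frames, neutral_gmm, source_gmm):
    """Assumed criterion: positive when frames fit the neutral model better."""
    return gmm_avg_loglik(frames, *neutral_gmm) - gmm_avg_loglik(frames, *source_gmm)

# Toy single-component models: (weights, means, variances). Hypothetical values.
neutral_gmm = ([1.0], [[0.0, 0.0]], [[1.0, 1.0]])
source_gmm = ([1.0], [[3.0, 3.0]], [[1.0, 1.0]])

before = normalization_score([[3.0, 3.0]], neutral_gmm, source_gmm)  # unnormalized input
after = normalization_score([[0.0, 0.0]], neutral_gmm, source_gmm)   # normalized input
print(before, after)
```

Under this assumed criterion, a successful normalization would move the score from negative (input resembles the source speaker) toward positive (output resembles the neutral voice). In practice the frames would be spectral feature vectors such as LSP frequencies, and the mixture models would be trained on each speaker's data.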
Similar resources
Considering Global Variance of the Log Power Spectrum Derived from Mel-Cepstrum in HMM-based Parametric Speech Synthesis
This paper utilizes global variance (GV) of the log power spectrum (LPS) derived from mel-cepstrum to improve hidden Markov model (HMM) based parametric speech synthesis. In order to alleviate over-smoothing of the generated spectral structures, an LPS-GV modeling method using line spectral pairs (LSPs) has been proposed in our previous work, where the estimated distribution of LPS-GV was combi...
Prediction of Voice Aperiodicity Based on Spectral Representations in HMM Speech Synthesis
In hidden Markov model-based speech synthesis, speech is typically parameterized using source-filter decomposition. A widely used analysis/synthesis framework, STRAIGHT, decomposes the speech waveform into a framewise spectral envelope and a mixed mode excitation signal. Inclusion of an aperiodicity measure in the model enables synthesis also for signals that are not purely voiced or unvoiced. ...
An HMM-Based Mandarin Chinese Text-To-Speech System
In this paper we present our Hidden Markov Model (HMM)-based, Mandarin Chinese Text-to-Speech (TTS) system. Mandarin Chinese or Putonghua, “the common spoken language”, is a tone language where each of the 400 plus base syllables can have up to 5 different lexical tone patterns. Their segmental and supra-segmental information is first modeled by 3 corresponding HMMs, including: (1) spectral env...
An improved model of masking effects for robust speech recognition system
The performance of an automatic speech recognition system drops dramatically in the presence of background noise, unlike the human auditory system, which is more adept at noisy speech recognition. This paper proposes a novel auditory modeling algorithm which is integrated into the feature extraction front-end for Hidden Markov Model (HMM). The proposed algorithm is named LTFC which simulates properti...